Traffic Control
Traffic Allocation
- Firas Abuzaid, Srikanth Kandula, Behnaz Arzani, Ishai Menache, Matei Zaharia, and Peter Bailis. Contracting wide-area network topologies to solve flow problems quickly. Proc. of NSDI: 175–200. 2021.
- Venue: NSDI
- Company: Microsoft
- Recommended reading
- Abstract: Many enterprises today manage traffic on their wide-area networks using software-defined traffic engineering schemes, which scale poorly with network size; the solver runtimes and number of forwarding entries needed at switches increase to untenable levels. We describe a novel method which, instead of solving a multi-commodity flow problem on the network, solves (1) a simpler problem on a contraction of the network, and (2) a set of sub-problems in parallel on disjoint clusters within the network. Our results on the topology and demands from a large enterprise, as well as on publicly available topologies, show that, in the median case, our method nearly matches the solution quality of currently deployed solutions, but is 8× faster and requires 6× fewer FIB entries. We also show the value-add from using a faster solver to track changing demands and to react to faults.
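The contraction idea above lends itself to a compact illustration. Below is a minimal sketch, assuming a toy two-cluster topology and a single commodity (the paper handles full multi-commodity flow and bounds the optimality loss of the contraction):

```python
# A toy sketch of contraction-based flow solving (assumed clusters and
# capacities; the real system solves multi-commodity flow with guarantees).
import networkx as nx

# Full topology: two 3-node clusters joined by two inter-cluster links.
G = nx.Graph()
edges = [("a1", "a2", 10), ("a2", "a3", 10), ("a1", "a3", 5),
         ("b1", "b2", 10), ("b2", "b3", 10), ("b1", "b3", 5),
         ("a3", "b1", 4), ("a2", "b2", 3)]
for u, v, cap in edges:
    G.add_edge(u, v, capacity=cap)

cluster = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}

# Step 1: contract each cluster to a supernode, summing border capacities.
C = nx.Graph()
for u, v, data in G.edges(data=True):
    cu, cv = cluster[u], cluster[v]
    if cu != cv:
        cap = data["capacity"] + C.get_edge_data(cu, cv, {"capacity": 0})["capacity"]
        C.add_edge(cu, cv, capacity=cap)

# Step 2: solve the (much smaller) problem on the contraction.
value, _ = nx.maximum_flow(C, "A", "B")
print("inter-cluster flow bound:", value)  # 7 = 4 + 3

# Step 3 (omitted): solve per-cluster subproblems in parallel to route
# traffic from endpoints to the border links chosen above.
```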
- Guoming Tang, Huan Wang, Kui Wu, and Deke Guo. Tapping the Knowledge of Dynamic Traffic Demands for Optimal CDN Design. IEEE/ACM Transactions on Networking 27, 1: 98–111. 2019.
- Venue: IEEE/ACM Transactions on Networking
- Company: None
- Recommended reading
- Abstract: The content delivery network (CDN) intensively uses cache to push content close to end users. Over both the traditional Internet architecture and the emerging cloud-based framework, cache allocation has been the core problem that any CDN operator needs to address. As the first step for cache deployment, CDN operators need to discover or estimate the distribution of user requests in different geographic areas. This step results in a statistical spatial model for the user requests, which is used as the key input to solve the optimal cache deployment problem. More often than not, the temporal information in user requests is omitted to simplify the CDN design. In this paper, we disclose that the spatial request model alone may not lead to truly optimal cache deployment and revisit the problem by taking the dynamic traffic demands into consideration. Specifically, we model the time-varying traffic demands and formulate the distributed cache deployment optimization problem as an integer linear program (ILP). To solve the problem efficiently, we transform the ILP into a scalable form and propose a greedy algorithm to tackle it. Via experiments over the North American ISPs' points of presence (PoPs) network, our new solution outperforms the traditional CDN design method and saves the overall delivery cost by 16% to 20%. We also study the impact of various traffic demand patterns on the CDN design cost, via experiments with both real-world traffic demand patterns and extensive synthetic trace data.
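To make the formulation concrete, here is a toy cache-deployment ILP in PuLP. The PoPs, demands, distances, and costs are invented for illustration; the paper's ILP additionally models time-varying demand:

```python
# A toy cache-placement ILP in PuLP (assumed PoPs, demands, and costs;
# the paper's ILP additionally captures time-varying traffic demands).
import pulp

pops = ["NYC", "CHI", "LAX"]
demand = {"NYC": 100, "CHI": 60, "LAX": 80}          # requests per PoP
dist = {("NYC", "NYC"): 0, ("NYC", "CHI"): 1, ("NYC", "LAX"): 4,
        ("CHI", "NYC"): 1, ("CHI", "CHI"): 0, ("CHI", "LAX"): 3,
        ("LAX", "NYC"): 4, ("LAX", "CHI"): 3, ("LAX", "LAX"): 0}
cache_cost, max_caches = 50, 2

prob = pulp.LpProblem("cache_deployment", pulp.LpMinimize)
y = pulp.LpVariable.dicts("open", pops, cat="Binary")          # cache at PoP?
x = pulp.LpVariable.dicts("serve", dist.keys(), cat="Binary")  # (user, cache)

# Cost = cache deployment + distance-weighted delivery.
prob += (pulp.lpSum(cache_cost * y[p] for p in pops)
         + pulp.lpSum(demand[u] * dist[u, c] * x[u, c] for u, c in dist))
for u in pops:                       # each PoP served by exactly one cache
    prob += pulp.lpSum(x[u, c] for c in pops) == 1
for u, c in dist:                    # only by an opened cache
    prob += x[u, c] <= y[c]
prob += pulp.lpSum(y[p] for p in pops) <= max_caches

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: int(y[p].value()) for p in pops})
```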
- Zhihao Li, Neil Spring, Dave Levin, and Bobby Bhattacharjee. Internet anycast: Performance, problems, & potential. Proc. of SIGCOMM: 59–73. 2018.
- Venue: SIGCOMM
- Company: None
- Recommended reading
- Abstract: Internet anycast depends on inter-domain routing to direct clients to their "closest" sites. Using data collected from a root DNS server for over a year (400M+ queries/day from 100+ sites), we characterize the load balancing and latency performance of global anycast. Our analysis shows that site loads are often unbalanced, and that most queries travel longer than necessary, many by over 5000 km. Investigating the root causes of these inefficiencies, we attribute path inflation to two causes. Like unicast, anycast routes are subject to inter-domain routing topology and policies that can increase path length compared to the theoretical shortest path (e.g., great-circle distance). Unlike unicast, anycast routes are also affected by poor route selection when paths to multiple sites are available, subjecting anycast routes to an additional, unnecessary penalty. Unfortunately, BGP provides no information about the number or goodness of reachable anycast sites. We propose an additional hint in BGP advertisements for anycast routes that can enable ISPs to make better choices when multiple "equally good" routes are available. Our results show that such routing hints can eliminate much of the anycast path inflation, enabling anycast to approach the performance of unicast routing.
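The path inflation the authors measure is relative to the great-circle distance baseline. A minimal sketch of that computation, with hypothetical client and site coordinates:

```python
# Great-circle ("as the crow flies") distance, the baseline against which
# anycast path inflation is measured. Coordinates here are illustrative.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    R = 6371.0  # mean Earth radius in km
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * R * asin(sqrt(a))

client = (40.7, -74.0)        # New York
nearest_site = (38.9, -77.0)  # Washington, D.C.
chosen_site = (51.5, -0.1)    # London (a poor anycast catchment)

ideal = haversine_km(*client, *nearest_site)
actual = haversine_km(*client, *chosen_site)
print(f"inflation: {actual - ideal:.0f} km")  # extra distance traveled
```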
- Shihan Xiao, Haiyan Mao, Bo Wu, Wenjie Liu, and Fenglin Li. Neural packet routing. Proc. of NetAI, SIGCOMM Workshop: 28–34. 2020.
- Venue: SIGCOMM Workshop (NetAI)
- Company: Huawei (no academic co-authors)
- Recommended reading
- Abstract: Deep learning has shown great potential for automatically generating routing protocols for different optimization objectives. Although it may bring superior performance gains, a fundamental obstacle prevents network operators from deploying it in real-world networks: the statistical uncertainty inherent in deep learning cannot provide the basic connectivity guarantee required in real-world routing. In this paper, we propose the first deep-learning-based distributed routing system (named NGR) that achieves the connectivity guarantee while still attaining routing optimality. NGR provides a novel packet routing framework based on link reversal theory. Specially designed neural network structures are further proposed to incorporate seamlessly into the framework. We apply NGR to the tasks of shortest-path routing and load balancing. The evaluation results validate that NGR achieves a 100% connectivity guarantee despite the uncertainty of deep learning and attains performance close to the optimal solution.
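NGR's connectivity guarantee rests on link reversal theory. A minimal sketch of full link reversal (Gafni–Bertsekas) on a toy graph that has just lost a link, assuming the destination remains reachable:

```python
# Full link reversal (Gafni-Bertsekas): any node other than the destination
# that loses all outgoing links reverses all of its incident links. As long
# as the destination is reachable, this reconverges to a DAG in which every
# node has a path to the destination.
def full_link_reversal(nodes, edges, dest):
    edges = set(edges)  # directed edges (u, v): u routes toward v
    while True:
        sinks = [n for n in nodes if n != dest
                 and not any(u == n for u, v in edges)]
        if not sinks:
            return edges
        for s in sinks:  # reverse every link incident to a stuck node
            incident = {(u, v) for u, v in edges if v == s}
            edges -= incident
            edges |= {(s, u) for u, v in incident}

# Toy 4-node graph: the link (b, d) just failed, leaving b a sink.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("c", "d")]
print(sorted(full_link_reversal(nodes, edges, "d")))
# [('a', 'c'), ('b', 'a'), ('c', 'd')] -- b now routes via a and c to d
```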
- Praveen Kumar, Yang Yuan, Chris Yu, et al. Semi-Oblivious Traffic Engineering: The Road Not Taken. Proc. of NSDI. 2018.
- Venue: NSDI
- Company: Facebook
- Recommended reading
- Abstract: Networks are expected to provide reliable performance under a wide range of operating conditions, but existing traffic engineering (TE) solutions optimize for performance or robustness, not both. A key factor that impacts the quality of a TE system is the set of paths used to carry traffic. Some systems rely on shortest paths, which leads to excessive congestion in topologies with bottleneck links, while others use paths that minimize congestion, which are brittle and prone to failure. This paper presents a system that uses a set of paths computed using Räcke's oblivious routing algorithm, as well as a centralized controller that dynamically adapts sending rates. Although oblivious routing and centralized TE have been studied previously in isolation, their combination is novel and powerful. We built a software framework to model TE solutions and conducted extensive experiments across a large number of topologies and scenarios, including the production backbones of a large content provider and an ISP. Our results show that semi-oblivious routing provides near-optimal performance and is far more robust than state-of-the-art systems.
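Given the fixed (oblivious) path set, the centralized controller's rate adaptation is a small linear program: minimize the maximum link utilization subject to carrying the demand. A hedged PuLP sketch on a toy topology (computing Räcke's oblivious paths themselves is the hard part and is not shown):

```python
# Given a fixed path set (as in semi-oblivious TE), split one demand across
# its paths to minimize the worst link utilization. Toy capacities/paths.
import pulp

capacity = {("s", "a"): 10, ("a", "t"): 10, ("s", "b"): 5, ("b", "t"): 5}
paths = {"p1": [("s", "a"), ("a", "t")], "p2": [("s", "b"), ("b", "t")]}
demand = 12.0

prob = pulp.LpProblem("semi_oblivious_rates", pulp.LpMinimize)
rate = pulp.LpVariable.dicts("rate", paths, lowBound=0)
umax = pulp.LpVariable("max_utilization", lowBound=0)
prob += umax
prob += pulp.lpSum(rate[p] for p in paths) == demand  # carry all traffic
for link, cap in capacity.items():
    load = pulp.lpSum(rate[p] for p in paths if link in paths[p])
    prob += load <= umax * cap  # utilization of every link <= umax

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: rate[p].value() for p in paths}, "umax =", umax.value())
# Optimal split: 8 on p1, 4 on p2, so every link sits at 0.8 utilization.
```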
- Yuan Zuo, Yulei Wu, Geyong Min, and Laizhong Cui. Learning-based network path planning for traffic engineering. Future Generation Computer Systems 92: 59–67. 2019.
- Venue: Future Generation Computer Systems
- Company: None
- Abstract: Recent advances in traffic engineering offer a series of techniques to address network problems caused by the explosive growth of Internet traffic. In traffic engineering, dynamic path planning is essential for prevalent applications, e.g., load balancing, traffic monitoring, and firewalls. Application-specific methods can indeed improve network performance but can hardly be extended to general scenarios. Meanwhile, the massive data generated in the current Internet has not been fully exploited, yet it may convey much valuable knowledge and information to facilitate traffic engineering. In this paper, we propose a learning-based network path planning method under forwarding constraints for finer-grained and effective traffic engineering. We formulate path planning as the problem of inferring a sequence of nodes in a network path and adapt a sequence-to-sequence model to learn implicit forwarding paths from empirical network traffic data. To boost model performance, an attention mechanism and beam search are adapted to capture the essential sequential features of the nodes in a path and to guarantee path connectivity. To validate the effectiveness of the derived model, we implement it in the Mininet emulator environment and leverage traffic data generated on both a real-world GEANT network topology and a grid network topology to train and evaluate the model. Experimental results exhibit high testing accuracy and imply the superiority of our proposal.
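The decoding step the abstract mentions can be sketched as beam search over next hops, with connectivity guaranteed by only extending along real links. The transition probabilities below are a stand-in for the trained sequence-to-sequence model:

```python
# Beam search over next hops: keep the k best partial paths, extend each
# only along real links (guaranteeing connectivity), stop at the target.
# The scores below stand in for a trained seq2seq model's output; a real
# system would condition on the whole prefix (and demand) via attention.
import math

graph = {"s": ["a", "b"], "a": ["t", "b"], "b": ["t"], "t": []}
p_next = {("s", "a"): 0.6, ("s", "b"): 0.4, ("a", "t"): 0.7,
          ("a", "b"): 0.3, ("b", "t"): 1.0}

def beam_search(src, dst, width=2, max_len=6):
    beams = [([src], 0.0)]  # (path, log-probability)
    for _ in range(max_len):
        done = [b for b in beams if b[0][-1] == dst]
        if done:  # best finished candidate wins
            return max(done, key=lambda b: b[1])
        expanded = []
        for path, lp in beams:
            for nxt in graph[path[-1]]:
                if nxt not in path:  # no loops
                    expanded.append((path + [nxt],
                                     lp + math.log(p_next[path[-1], nxt])))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:width]
    return None

print(beam_search("s", "t"))  # (['s', 'a', 't'], log(0.6 * 0.7))
```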
- Ming Li, Yuewen Wang, Zhaowen Wang, and Huiying Zheng. A deep learning method based on an attention mechanism for wireless network traffic prediction. Ad Hoc Networks 107. 2020.
- Venue: Ad Hoc Networks
- Company: None
- Recommended reading
- Abstract: With the rapid development of wireless networks, the self-management and active-adjustment capabilities of base stations have become crucial. Accurate prediction of wireless network traffic is an important prerequisite for intelligent base stations. Traffic data exhibits a high degree of nonlinearity and complexity, characterized by temporal and spatial correlation. Most existing forecasting methods do not consider both the temporal and the spatial dimensions when modeling traffic data. In this paper, a spatio-temporal convolutional network (LA-ResNet) is presented that uses an attention mechanism to perform spatio-temporal modeling and predict wireless network traffic. LA-ResNet consists of three parts: a residual network, a recurrent neural network, and an attention mechanism. Using this method, the temporal and spatial characteristics of wireless network traffic data are modeled and their related features are strengthened. Thus, the spatio-temporal correlation of wireless network traffic data can be effectively captured. The residual network captures spatial features in the data; the combination of the recurrent neural network and the attention mechanism captures its temporal dependence. Finally, experiments on a real data set show that the prediction performance of the LA-ResNet model is better than existing prediction methods such as RNN and 3DCNN, and that accurate traffic prediction can be realized.
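A heavily simplified PyTorch skeleton of the three listed ingredients: a residual convolution for spatial features, an RNN for temporal ones, and attention pooling over the RNN outputs. Dimensions and wiring are illustrative assumptions, not the paper's exact LA-ResNet:

```python
# Illustrative skeleton: residual conv (space) -> LSTM (time) -> attention.
# Shapes and wiring are assumptions; the paper's LA-ResNet differs in detail.
import torch
import torch.nn as nn

class TrafficPredictor(nn.Module):
    def __init__(self, grid=8, hidden=64):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 1, 3, padding=1)   # residual branch
        self.lstm = nn.LSTM(grid * grid, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)              # scores each timestep
        self.head = nn.Linear(hidden, grid * grid)

    def forward(self, x):                 # x: (batch, time, grid, grid)
        b, t, h, w = x.shape
        frames = x.reshape(b * t, 1, h, w)
        spatial = frames + self.conv2(torch.relu(self.conv1(frames)))
        seq = spatial.reshape(b, t, h * w)
        out, _ = self.lstm(seq)                       # (b, t, hidden)
        weights = torch.softmax(self.attn(out), dim=1)
        context = (weights * out).sum(dim=1)          # attention pooling
        return self.head(context).reshape(b, h, w)    # next-step traffic

model = TrafficPredictor()
history = torch.randn(4, 12, 8, 8)  # 4 samples, 12 timesteps, 8x8 cell grid
print(model(history).shape)         # torch.Size([4, 8, 8])
```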
- Abdullah Bin Faisal, Hafiz Mohsin Bashir, Ihsan Ayyub Qazi, Zartash Uzmi, and Fahad R Dogar. Workload adaptive flow scheduling. Proc. of CoNEXT: 241–253. 2018.
- Venue: CoNEXT
- Company: None
- Recommended reading
- Abstract: Existing flow scheduling schemes for data center networks optimize for a specific workload and performance metric. In this paper, we present 2D, a new scheduling policy that offers robustness across performance metrics and changing workloads, ground that existing scheduling policies are unable to cover. 2D combines the basic scheduling building blocks of multiplexing and serialization in a principled way, ensuring tail-optimal performance across workloads while also improving the average (and lower-percentile) completion times. To implement 2D for flow-level scheduling in a distributed setting, we break the scheduling decision into two parts: coarse time-scale decisions based on workload and load changes are made by a centralized controller, while per-flow serialization decisions are made in a distributed fashion involving the end-points and sequencer(s). Our testbed experiments show that, for realistic cloud workloads, 2D provides consistent gains in tail and average flow completion times compared to basic scheduling techniques (e.g., FIFO and processor sharing) as well as heuristic-based schedulers (e.g., Aalo and Baraat).
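The trade-off 2D navigates shows up even in a toy flow-completion-time calculation of its two building blocks, assuming all flows arrive together on a unit-rate link: FIFO serialization suffers head-of-line blocking behind an elephant flow, while fair multiplexing (processor sharing) protects the short flows:

```python
# Flow completion times (FCT) on a unit-rate link for flows that all
# arrive at t=0, under the two building blocks that 2D combines.
def fct_fifo(sizes):                      # serialization in arrival order
    done, t = [], 0.0
    for s in sizes:
        t += s
        done.append(t)
    return done

def fct_processor_sharing(sizes):         # fair multiplexing
    remaining = sorted(sizes)
    done, t, last = [], 0.0, 0.0
    while remaining:
        n = len(remaining)
        t += (remaining[0] - last) * n    # until the smallest flow finishes
        last = remaining.pop(0)
        done.append(t)
    return done

sizes = [10, 1, 2]                        # an elephant arrives first
for name, fcts in [("fifo", fct_fifo(sizes)),
                   ("ps  ", fct_processor_sharing(sizes))]:
    print(name, "avg:", round(sum(fcts) / len(fcts), 2), "max:", max(fcts))
# fifo avg: 11.33  vs  ps avg: 7.0; both finish all work at t=13.
```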
Inter-Datacenter Scheduling
- Sepehr Abbasi-Zadeh, Mohammad Amin Beiruti, Yashar Ganjali, and Zhenhua Hu. Application-Aware Load Migration Protocols for Network Controllers. Proc. of ICNP (Poster). 2020.
- Venue: ICNP Poster
- Company: Huawei
- Abstract: Load migration protocols have been used for load balancing in network controllers. In this poster, we argue that other network applications (e.g., power saving, network security, failure recovery) have properties that might require different load migration protocols. We introduce four new load migration protocols and show how they might better match different application requirements. We present preliminary experimental results for one of these protocols that show a 20%–30% speedup in total load migration time.
- Sepehr Abbasi-Zadeh, Mohammad Amin Beiruti, Yashar Ganjali, and Zhenhua Hu. Fast Scheduling for Load Migration in Distributed Network Controllers. Proc. of ICNP (Poster). 2020.
- Venue: ICNP Poster
- Company: Huawei
- Recommended reading
- Abstract: As network traffic and conditions change, the load on different instances of the control plane changes. To ensure various control applications can operate continuously and efficiently, we need to migrate load among controller instances. For this, we need a migration schedule that minimizes the overall migration time while respecting quality-of-service and controller resource constraints. In this poster, we show this problem is NP-hard, and show how a heuristic algorithm performs close to the best existing solution with orders-of-magnitude reduction in scheduling time.
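A hedged sketch of the kind of greedy heuristic such schedulers use: pack migrations into rounds so that no controller instance moves more than its per-round capacity, approximating minimal total migration time. The migration list and capacity are invented:

```python
# Greedy round-based scheduling of load migrations between controller
# instances: each round, pack non-conflicting migrations subject to a
# per-instance capacity, aiming for few rounds (i.e., short total time).
def schedule(migrations, capacity):
    pending = sorted(migrations, key=lambda m: -m[2])  # biggest first
    rounds = []
    while pending:
        budget = {}          # load moved per instance this round
        this_round, leftover = [], []
        for src, dst, load in pending:
            if (budget.get(src, 0) + load <= capacity and
                    budget.get(dst, 0) + load <= capacity):
                budget[src] = budget.get(src, 0) + load
                budget[dst] = budget.get(dst, 0) + load
                this_round.append((src, dst, load))
            else:
                leftover.append((src, dst, load))
        rounds.append(this_round)
        pending = leftover
    return rounds

migrations = [("c1", "c2", 4), ("c1", "c3", 3), ("c2", "c3", 2), ("c3", "c1", 5)]
for i, r in enumerate(schedule(migrations, capacity=6)):
    print("round", i, r)
```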
Failover Traffic Shifting
- Usama Naseer, Luca Niccolini, Udip Pant, Alan Frindell, Ranjeeth Dasineni, and Theophilus A. Benson. Zero Downtime Release: Disruption-free Load Balancing of a Multi-Billion User Website. Proc. of SIGCOMM: 529–541. 2020.
- Venue: SIGCOMM
- Company: Facebook
- Recommended reading
- Abstract: Modern network infrastructure has evolved into a complex organism to satisfy the performance and availability requirements of billions of users. Frequent releases such as code upgrades, bug fixes, and security updates have become the norm. Millions of globally distributed infrastructure components, including servers and load balancers, are restarted frequently, from multiple times per day to per week. However, every release brings possibilities of disruption, as it can result in reduced cluster capacity, disturb the intricate interaction of components operating at large scale, and disrupt end users by terminating their connections. The challenge is further complicated by the scale and heterogeneity of supported services and protocols. In this paper, we leverage different components of the end-to-end networking infrastructure to prevent or mask any disruptions in the face of releases. Zero Downtime Release is a collection of mechanisms used at Facebook to shield end users from any disruptions, preserving the cluster capacity and the robustness of the infrastructure when updates are released globally. Our evaluation shows that these mechanisms prevent any significant cluster capacity degradation when a considerable number of production servers and proxies are restarted, and minimize the disruption for different services (notably TCP, HTTP, and publish/subscribe).
DNS Configuration
- Casey Deccio and Jacob Davis. DNS privacy in practice and preparation. Proc. of CoNEXT: 138–143. 2019.
- Venue: CoNEXT
- Company: Sandia National Laboratories
- Abstract: An increased demand for privacy in Internet communications has resulted in privacy-centric enhancements to the Domain Name System (DNS), including the use of Transport Layer Security (TLS) and Hypertext Transfer Protocol Secure (HTTPS) for DNS queries. In this paper, we seek to answer questions about their deployment, including their prevalence and their characteristics. Our work includes an analysis of DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH) availability at open resolvers and authoritative DNS servers. We find that DoT and DoH services exist on just a fraction of open resolvers, but among them are the major vendors of public DNS services. We also analyze the state of TCP Fast Open (TFO), which is considered key to reducing the latency associated with TCP-based DNS queries, required by DoT and DoH. The uptake of TFO is extremely low, both on the server side and the client side, and it must be improved to avoid performance degradation with continued adoption of DNS privacy enhancements.
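Checking DoT availability, as this study does at scale, amounts to attempting a TLS handshake on TCP port 853. A minimal probe (single resolver, certificate validation disabled since we probe by IP; the paper also measures DoH and TFO):

```python
# Probe a resolver for DNS-over-TLS support: DoT is simply TLS on port 853,
# so a completed handshake is a strong signal the service exists.
import socket, ssl

def supports_dot(ip, timeout=3.0):
    ctx = ssl.create_default_context()
    ctx.check_hostname = False            # probing by IP, not by name
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((ip, 853), timeout=timeout) as tcp:
            with ctx.wrap_socket(tcp) as tls:
                return tls.version()      # e.g. 'TLSv1.3'
    except OSError:
        return None

print(supports_dot("1.1.1.1"))  # Cloudflare's public resolver offers DoT
```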
- Abhishta Abhishta, Roland Van Rijswijk-Deij, and Lambert J.M. Nieuwenhuis. Measuring the impact of a successful DDoS attack on the customer behaviour of managed DNS service providers. Proc. of WTMC: 70–76. 2018.
- Venue: WTMC
- Company: None
- Abstract: Distributed Denial-of-Service (DDoS) attacks continue to pose a serious threat to the availability of Internet services. The Domain Name System (DNS) is part of the core of the Internet and a crucial factor in the successful delivery of Internet services. Because of the importance of DNS, specialist service providers have sprung up in the market that provide managed DNS services. One of their key selling points is that they protect DNS for a domain against DDoS attacks. But what if such a service becomes the target of a DDoS attack, and that attack succeeds? In this paper we analyse two such events, an attack on NS1 in May 2016 and an attack on Dyn in October 2016. We do this by analysing the change in the behaviour of the services' customers. For our analysis we leverage data from the OpenINTEL active DNS measurement system, which covers large parts of the global DNS over time. Our results show an almost immediate and statistically significant change in the behaviour of domains that use NS1 or Dyn as a DNS service provider. We observe a decline in the number of domains that exclusively use NS1 or Dyn as a managed DNS service provider, and see a shift toward risk spreading by using multiple providers. While a large managed DNS provider may be better equipped to protect against attacks, these two case studies show they are not impervious to them. This calls into question the wisdom of using a single provider for managed DNS. Our results show that spreading risk by using multiple providers is an effective countermeasure, albeit probably at a higher cost.
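The behavioral signal the study tracks (whether a domain depends on a single managed DNS provider) reduces to inspecting NS records. A toy sketch using dnspython, with a made-up suffix-to-provider map standing in for a curated list:

```python
# Check whether a domain's NS set depends on a single DNS provider
# (a single point of failure) or spreads risk across several.
import dns.resolver  # pip install dnspython

def provider_of(ns_name):
    # Toy mapping from NS hostname substrings to providers; a real study
    # would use a curated list. Unmapped names collapse to "other" here.
    for marker, provider in {"nsone.net.": "NS1", "dynect.net.": "Dyn",
                             "awsdns": "Route53"}.items():
        if marker in ns_name:
            return provider
    return "other"

def providers(domain):
    answers = dns.resolver.resolve(domain, "NS")
    return {provider_of(str(r.target).lower()) for r in answers}

for d in ["example.com"]:
    p = providers(d)
    print(d, p, "single-provider!" if len(p) == 1 else "multi-provider")
```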
Active Measurement (Dial Testing)
- Xinlei Yang, Xianlong Wang, Zhenhua Li, and Yunhao Liu. Fast and Light Bandwidth Testing for Internet Users. Proc. of NSDI. 2021.
- Venue: NSDI
- Company: Alibaba
- Recommended reading
- Abstract: Bandwidth testing measures the access bandwidth of end hosts, which is crucial to emerging Internet applications for network-aware content delivery. However, today's bandwidth testing services (BTSes) are slow and costly: the tests take a long time to run, consume excessive data usage at the client side, and/or require large-scale test server deployments. The inefficiency and high cost of BTSes are rooted in methodologies that use excessive temporal and spatial redundancy to combat noise in Internet measurement. This paper presents FastBTS to make BTS fast and cheap while maintaining high accuracy. The key idea of FastBTS is to accommodate and exploit the noise rather than repetitively and exhaustively suppress its impact. This is achieved by a novel statistical sampling framework (termed fuzzy rejection sampling). We build FastBTS as an end-to-end BTS that implements fuzzy rejection sampling based on elastic bandwidth probing and denoised sampling from high-fidelity windows, together with server selection and multihoming support. Our evaluation shows that with only 30 test servers, FastBTS achieves the same level of accuracy as the state-of-the-art BTS (SpeedTest.net), which deploys 12,000 servers. Most importantly, FastBTS makes bandwidth tests 5.6× faster and 10.7× more data-efficient.
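The flavor of denoised sampling from high-fidelity windows can be sketched as: collect throughput samples, report the mean of the densest narrow window, and stop once that estimate stabilizes. The samples below are synthetic; the real FastBTS couples this with elastic probing and server selection:

```python
# Sketch of window-based denoising for a bandwidth test: pick the narrow
# throughput window containing the most samples and report its mean,
# stopping early once the estimate stabilizes. Samples here are synthetic.
import random

def densest_window_mean(samples, width_frac=0.1):
    xs = sorted(samples)
    width = max(xs) * width_frac
    best = (0, 0.0)
    for i, lo in enumerate(xs):           # window [lo, lo + width]
        inside = [x for x in xs[i:] if x <= lo + width]
        if len(inside) > best[0]:
            best = (len(inside), sum(inside) / len(inside))
    return best[1]

random.seed(0)
true_bw = 100.0
samples, last = [], None
while True:
    # Throughput sample: mostly near true_bw, occasionally crushed by noise.
    s = true_bw * (0.95 + 0.1 * random.random())
    if random.random() < 0.3:
        s *= random.uniform(0.2, 0.8)     # cross traffic, slow start, ...
    samples.append(s)
    est = densest_window_mean(samples)
    if last and len(samples) >= 5 and abs(est - last) / last < 0.02:
        break                             # estimate stabilized: stop early
    last = est
print(f"{len(samples)} samples -> {est:.1f} Mbps")
```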
- Yikai Zhao, Kaicheng Yang, Zirui Liu, et al. Light guardian: A full-visibility, lightweight, in-band telemetry system using sketchlets. Proc. of NSDI: 991–1007. 2021.
- Venue: NSDI
- Company: Huawei
- Recommended reading
- Abstract: Network traffic measurement is central to successful network operations, especially for today's hyper-scale networks. Although existing works have made great contributions, they fail to achieve the following three criteria simultaneously: 1) full visibility, which refers to the ability to acquire any desired per-hop flow-level information for all flows; 2) low overhead in terms of computation, memory, and bandwidth; and 3) robustness, meaning the system can survive partial network failures. We design LightGuardian to meet these three criteria. Our key innovation is a (small) constant-sized data structure, called a sketchlet, which can be embedded in packet headers. Specifically, we design a novel SuMax sketch to accurately capture flow-level information. SuMax can be divided into sketchlets, which are carried in-band by passing packets to the end hosts for aggregation, reconstruction, and analysis. We have fully implemented a LightGuardian prototype on a testbed with 10 programmable switches and 8 end hosts in a FatTree topology, and conducted extensive experiments and evaluations. Experimental results show that LightGuardian can obtain per-flow per-hop information within 1.0–1.5 seconds with consistently low overhead, using only 0.07% of the total bandwidth capacity of the network. We believe LightGuardian is the first system to collect per-flow per-hop information for all flows in the network with negligible overhead.
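The sketchlet idea can be illustrated with a count-min-style sketch whose rows are split into constant-size fragments that could ride in packet headers and be reassembled at end hosts. The layout and sizes are assumptions; SuMax itself records richer per-flow information:

```python
# A count-min-style sketch split into constant-size "sketchlets" that could
# ride in packet headers; receivers reassemble fragments back into rows.
# Layout and sizes are illustrative, not SuMax's actual design.
import hashlib

ROWS, COLS, SKETCHLET = 3, 16, 4        # 4 counters per sketchlet

def h(row, key):
    d = hashlib.sha256(f"{row}:{key}".encode()).digest()
    return int.from_bytes(d[:4], "big") % COLS

class CountMin:
    def __init__(self):
        self.t = [[0] * COLS for _ in range(ROWS)]
    def add(self, key, n=1):
        for r in range(ROWS):
            self.t[r][h(r, key)] += n
    def query(self, key):
        return min(self.t[r][h(r, key)] for r in range(ROWS))
    def sketchlets(self):               # fragment: (row, offset, counters)
        for r in range(ROWS):
            for off in range(0, COLS, SKETCHLET):
                yield (r, off, self.t[r][off:off + SKETCHLET])

sw = CountMin()                         # "switch-side" sketch
for pkt in ["flowA"] * 90 + ["flowB"] * 10:
    sw.add(pkt)

collector = CountMin()                  # end host rebuilds from fragments
for r, off, counters in sw.sketchlets():
    collector.t[r][off:off + SKETCHLET] = counters
print(collector.query("flowA"), collector.query("flowB"))  # ~90, ~10
```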
- Florian Wohlfart, Nikolaos Chatzis, Caglar Dabanoglu, Georg Carle, and Walter Willinger. Leveraging interconnections for performance: The serving infrastructure of a large CDN. Proc. of SIGCOMM: 206–220. 2018.
- Venue: SIGCOMM
- Company: Akamai, NIKSUN
- Recommended reading
- Abstract: Today's large content providers (CPs) are busy building out their service infrastructures or "peering edges" to satisfy the insatiable demand for content created by an ever-expanding Internet edge. One component of these serving infrastructures that features prominently in this build-out is their connectivity fabric; i.e., the set of all Internet interconnections that content has to traverse en route from the CP's various "deployments" or "serving sites" to end users. However, these connectivity fabrics have received little attention in the past and remain largely ill-understood. In this paper, we describe the results of an in-depth study of the connectivity fabric of Akamai. Our study reveals that Akamai's connectivity fabric consists of some 6,100 different "explicit" peerings (i.e., Akamai is one of the two involved peers) and about 28,500 different "implicit" peerings (i.e., Akamai is neither of the two peers). Our work contributes to a better understanding of real-world serving infrastructures by providing an original account of implicit peerings and demonstrating the performance benefits that Akamai can reap from leveraging its rich connectivity fabric for serving its customers' content to end users.
- Matt Calder, Manuel Schröder, et al. Odin: Microsoft's Scalable Fault-Tolerant CDN Measurement System. Proc. of NSDI. 2018.
- Venue: NSDI
- Company: Microsoft
- Recommended reading
- Abstract: Content delivery networks (CDNs) are critical for delivering high-performance Internet services. Using worldwide deployments of front-ends, CDNs can direct users to the front-end that provides them with the best latency and availability. The key challenges arise from the heterogeneous connectivity of clients and the dynamic nature of the Internet, which influence latency and availability. Without continuous insight into performance between users, front-ends, and external networks, CDNs cannot attain their full potential performance. We describe Odin, Microsoft's Internet measurement platform for its first-party and third-party customers. Odin is designed to handle Microsoft's large user base and need for large-scale measurements from users around the world. Odin integrates with Microsoft's varied suite of web-client and thick-client applications, all while being mindful of the regulatory and privacy concerns of enterprise customers. Odin has been operational for 2 years. We present the first detailed study of an Internet measurement platform of this scale and complexity.
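The client side of such a platform boils down to a measurement beacon: time small fetches against candidate endpoints and report the results. A toy sketch with placeholder URLs (Odin's real design adds sampling policy, fault tolerance, and privacy controls):

```python
# A toy client-side measurement beacon in the spirit of CDN measurement
# platforms: time small fetches against candidate front-ends and collect
# the results for upload. URLs below are placeholders.
import json, time, urllib.request

FRONTENDS = ["https://example.com/", "https://example.org/"]  # placeholders

def measure(url, timeout=5.0):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1024)
        return {"url": url, "ms": 1000 * (time.monotonic() - start)}
    except OSError as e:
        return {"url": url, "error": str(e)}   # availability signal too

report = [measure(u) for u in FRONTENDS]
print(json.dumps(report, indent=2))
# A real platform would POST this to a collector and join it with
# server-side logs; Odin additionally samples users at massive scale.
```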
- Kevin Vermeulen, Justin P. Rohrer, et al. Diamond-Miner: Comprehensive Discovery of the Internet's Topology Diamonds. Proc. of NSDI. 2020.
- Venue: NSDI
- Company: None
- Recommended reading
- Abstract: Despite the well-known existence of load-balanced forwarding paths in the Internet, current active Internet-wide topology mapping efforts are multipath agnostic, largely because of the probing volume and time required by existing multipath discovery techniques. This paper introduces D-Miner, a system that marries previous work on high-speed probing with multipath discovery to make Internet-wide topology mapping, inclusive of load-balanced paths, feasible. We deploy D-Miner and collect multiple IPv4 interface-level topology snapshots, where we find >64% more edges and significantly more complex topologies relative to existing systems. We further scrutinize topological changes between snapshots and attribute forwarding differences not to routing or policy changes, but to load balancer "remapping" events. We precisely categorize remapping events and find that they are a much more frequent contributor to path changes than previously recognized. By making D-Miner and our collected Internet-wide topologies publicly available, we hope to facilitate a better understanding of the Internet's true structure and resilience.
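Multipath discovery rests on a statistical stopping rule: enough flow-varied probes must be sent to make a hidden extra next hop unlikely. A sketch of the classic MDA probe-count computation that D-Miner scales up (assuming a uniform per-flow load balancer):

```python
# MDA-style stopping rule: how many flow-ID-varied probes are needed to be
# (1 - alpha)-confident that a hop has no hidden extra next hop, assuming a
# uniform per-flow load balancer with k+1 equally likely branches.
def probes_needed(k, alpha=0.05):
    n = k + 1
    while True:
        # Union bound: P(some branch of k+1 never sampled in n probes).
        miss = (k + 1) * (k / (k + 1)) ** n
        if miss < alpha:
            return n
        n += 1

for k in range(1, 6):
    print(f"rule out >{k} branches: {probes_needed(k)} probes")
# Reproduces the classic MDA table for alpha=0.05: 6, 11, 16, ... probes.
```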
- Albert Mestres, Eduard Alarcón, Yusheng Ji, and Albert Cabellos-Aparicio. Understanding the modeling of computer network delays using neural networks. Proc. of Big-DAMA (SIGCOMM Workshop): 46–52. 2018.
- Venue: SIGCOMM Workshop (Big-DAMA)
- Company: NII, Japan
- Recommended reading
- Abstract: Recent trends in networking propose the use of Machine Learning (ML) techniques for the control and operation of the network. In this context, ML can be used as a network modeling technique to build models that estimate network performance. Indeed, network modeling is central to many networking functions, for instance in the field of optimization, in which a model is used to search for a configuration that satisfies the target policy. In this paper, we aim to answer the following question: can neural networks accurately model the delay of a computer network as a function of the input traffic? For this, we treat the network as a black box that takes a traffic matrix as input and produces delays as output. We then train different neural network models and evaluate their accuracy under different fundamental network characteristics: topology, size, traffic intensity, and routing. With this, we aim to gain a better understanding of computer network modeling with neural nets and ultimately provide practical guidelines on how such models need to be trained.
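The paper's black-box setup (traffic matrix in, delay out) fits in a few lines of scikit-learn. The training data below is synthetic, queueing-flavored delay on a single bottleneck, standing in for the paper's simulated topologies:

```python
# Black-box delay modeling: learn delay = f(traffic matrix) with an MLP.
# Training data is synthetic (M/M/1-like delay on one bottleneck link),
# standing in for the paper's simulated topologies.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, flows, capacity = 2000, 6, 10.0
X = rng.uniform(0.1, 1.4, size=(n, flows))        # per-flow demand
load = X.sum(axis=1)
y = 1.0 / np.maximum(capacity - load, 0.1)        # queueing-like delay curve

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(Xtr, ytr)
print("R^2 on held-out traffic matrices:", round(model.score(Xte, yte), 3))
```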